Fault-Tolerant Shared Memory Simulations

Authors

  • Petra Berenbrink
  • Friedhelm Meyer auf der Heide
  • Volker Stemann
Abstract

We consider the problem of simulating a PRAM on a faulty distributed memory machine (DMM). We focus on dynamic faults, i.e., each processor or memory module fails independently with a fixed probability during the simulation of a PRAM step and remains faulty for the rest of the simulation. We build upon the randomized hashing-based simulations on non-faulty DMMs from [14], which achieve delay O(log log n) with high probability. We design and analyze routines for handling faults that occur during the simulation. Based on these routines, we present simulations on faulty DMMs with the same delay O(log log n) as in the non-faulty case, provided that the failure probability of processors and modules is small enough to guarantee that an expected linear number of processors and modules survive the simulation. Thus, resilience to memory or processor faults increases the delay of the simulation by at most a constant factor.
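The dynamic fault model in the abstract can be illustrated with a small sketch. If each unit (processor or module) fails independently with probability p in every step and never recovers, its chance of surviving T steps is (1 - p)^T, so the expected number of survivors among n units is n(1 - p)^T. The function names, the parameter `steps`, and the sample values below are assumptions made for illustration only; they are not taken from the paper, and the sketch models only the fault process, not the simulation itself.

```python
import random


def expected_survivors(n: int, p: float, steps: int) -> float:
    """Expected number of units still working after `steps` simulated PRAM steps.

    Assumes each unit fails independently with probability p per step
    and stays faulty once it has failed.
    """
    return n * (1.0 - p) ** steps


def simulate_survivors(n: int, p: float, steps: int, seed: int = 0) -> int:
    """One Monte Carlo run of the same fault model (hypothetical helper)."""
    rng = random.Random(seed)
    alive = n
    for _step in range(steps):
        # Every unit that is still alive fails in this step with probability p.
        alive = sum(1 for _unit in range(alive) if rng.random() >= p)
    return alive


if __name__ == "__main__":
    n, steps = 1024, 100
    p = 1.0 / steps  # illustrative choice: p on the order of 1/steps
    print(expected_survivors(n, p, steps))
    print(simulate_survivors(n, p, steps))
```

With p on the order of 1/T, the factor (1 - p)^T stays bounded below by a constant, so the expected number of survivors remains linear in n. This is one way to read the abstract's requirement that the failure probability be "small enough"; the precise bound used in the paper is not stated in the abstract.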


Similar resources

Using Peer Support to Reduce Fault-Tolerant Overhead in Distributed Shared Memories

We present a peer logging system for reducing performance overhead in fault-tolerant distributed shared memory systems. Our system provides fault-tolerant shared memory using individual checkpointing and rollback. Peer logging logs DSM modification messages to remote nodes instead of to local disks. We present results for implementations of our fault-tolerant technique using simulations of both...


Design and Analysis of a Dynamically Reconfigurable Shared Memory Cluster

In recent years, clusters have become a viable and less expensive alternative to multiprocessor systems. This paper proposes an architecture with a load balancing model and a fault-tolerant model for shared memory clusters. A task clustering algorithm, a centralized dynamic load balancing model, a load balancing algorithm and a fault-tolerant model are proposed for shared memory clusters. The res...


Practical Schemes using Logs for Lightweight Recoverable DSM

In the existing Fault-Tolerant Software Distributed Shared Memory (FT-SDSM) with the message logging, the logs are used only to recover the failed nodes. In our previous work, we have implemented a lightweight logging protocol, called remote logging, on the SDSM for fault tolerance, which incurs low logging overhead with a fast network and a remote memory for back-up data. In this paper, we pro...


Fault Tolerance and Performance of Multipath Multistage Interconnection Networks

In building a multiprocessor system, we can minimize the system's mean time to failure by providing an architecture resilient to component faults. We compare the fault tolerance and performance characteristics of various fault-tolerant multistage interconnection networks. We primarily focus on networks composed of dilated routing components. A dilated router features redundant outputs in each l...


A Hierarchical Shared Memory Cluster Architecture with Load Balancing and Fault Tolerance

Recently, a great deal of attention has been paid to the design of hierarchical shared memory cluster systems. Cluster computing has made hierarchical computing systems increasingly common as target environments for large-scale scientific computations. This paper proposes a hierarchical shared memory cluster architecture with load balancing and fault tolerance. Hierarchies of shared memory and cache...



Publication date: 1996